Improvements in machine translation for English/Iraqi speech translation
Abstract
In this paper, we describe techniques for improving machine translation quality in the context of speech-to-speech translation for significantly different language pairs. Specifically, we explore three broad approaches for improving translation from English to Iraqi and vice versa. First, we investigate normalization techniques that address the differences between the spoken and written forms of both languages. Second, we incorporate additional knowledge sources into the translation process, such as a bilingual lexicon and named entity detection. Third, we exploit the rich morphological structure of Iraqi Arabic using two different approaches. The first approach decomposes words in Iraqi Arabic, whereas the second, a novel one, inflects English by combining key phrases into words using the minimum description length criterion. Significant gains in accuracy are observed when translating from text as well as from speech recognition output.
Similar resources
Colloquial Iraqi ASR for speech translation
In this paper we describe a real-time speech recognition system developed for colloquial Iraqi Arabic. This system is currently used in our speech-to-speech translation system configured for bi-directional communication in English and Iraqi on a laptop. We present experimental results on Iraqi utterances from different speech-to-speech translation domains, and analyze the usefulness of acoustic...
Recent advances in SRI's IraqComm™ Iraqi Arabic-English speech-to-speech translation system
We summarize recent progress on SRI’s IraqComm™ Iraqi Arabic-English two-way speech-to-speech translation system. In the past year we made substantial developments in our speech recognition and machine translation technology, leading to significant improvements in both accuracy and speed of the IraqComm system. On the 2008 NIST-evaluation dataset our two-way speech-to-text (S2T) system achieved...
Fixed Length Word Suffix for Factored Statistical Machine Translation
Factored Statistical Machine Translation extends the Phrase Based SMT model by allowing each word to be a vector of factors. Experiments have shown effectiveness of many factors, including the Part of Speech tags in improving the grammaticality of the output. However, high quality part of speech taggers are not available in open domain for many languages. In this paper we used fixed length word...
A Wearable Headset Speech-to-Speech Translation System
In this paper we present a wearable, headset-integrated, eyes- and hands-free speech-to-speech (S2S) translation system. The S2S system described here is configured for translingual communication between English and colloquial Iraqi Arabic. It employs an n-gram speech recognition engine, a rudimentary phrase-based translator for translating recognized Iraqi text, and a rudimentary text-to-speech (TT...
Building an English-Iraqi Arabic machine translation system for spoken utterances with limited resources
This paper presents an English-Iraqi Arabic speech-to-speech statistical machine translation system using limited resources. In it, we explore the constraints involved, how we endeavored to mitigate such problems as a non-standard orthography and a highly inflected grammar, and discuss leveraging existing plentiful resources for Modern Standard Arabic to assist in this task. These combined tech...